Classification of Landsat TM Data
Classification refers to several statistical techniques
(or algorithms) used to sort and group data into discrete classes
which can be uniquely identified. Classification algorithms are
routinely used to reduce the large volume of data present in a
typical TM (or other sensor) dataset to several classes that are
meaningful to the investigator. There are two major types of
classification algorithm applied to remotely sensed data:
unsupervised and supervised. Unsupervised classification algorithms
(such as ISODATA) cluster data according to several user-defined
statistical parameters in an iterative fashion until either some
percentage of pixels remain unchanged or a maximum number of
iterations has been performed. This method of classification is most
useful when no prior knowledge or ground truth data for an area are
available. However, the classes determined by the algorithm still
require land cover identification by an experienced analyst, which
can be a significant disadvantage of the method.
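The iterative clustering loop and its two stopping rules can be sketched in a few lines of Python. This is a simplified illustration, not the full ISODATA algorithm: the class splitting and merging steps are omitted, and the function name, parameter names, and default values are assumptions chosen for the example.

```python
import math
import random

def cluster_pixels(pixels, n_classes=20, unchanged_threshold=0.95,
                   max_iterations=50, seed=0):
    """Simplified ISODATA-style clustering (no split/merge steps).

    pixels: list of equal-length tuples, one value per spectral band.
    Returns (labels, class_means, iterations_run).
    """
    rng = random.Random(seed)
    # Start from randomly chosen pixels as the initial class means.
    means = [list(map(float, p)) for p in rng.sample(pixels, n_classes)]
    labels = [-1] * len(pixels)
    for iteration in range(1, max_iterations + 1):
        # Assign every pixel to the nearest class mean (Euclidean distance).
        new_labels = [
            min(range(n_classes), key=lambda k: math.dist(p, means[k]))
            for p in pixels
        ]
        # Fraction of pixels that kept their class since the previous pass.
        unchanged = sum(a == b for a, b in zip(labels, new_labels)) / len(pixels)
        labels = new_labels
        # Recompute each class mean; an empty class keeps its old mean.
        for k in range(n_classes):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                means[k] = [sum(band) / len(members) for band in zip(*members)]
        # Stop when enough pixels are unchanged; otherwise the loop ends
        # when the maximum number of iterations has been performed.
        if unchanged >= unchanged_threshold:
            break
    return labels, means, iteration
```

The returned labels are only cluster numbers. Assigning a land cover name to each cluster remains the analyst's job, as noted above.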
The unsupervised classification technique is used for
portions of the CAP LTER region for which limited ancillary data
(land use, aerial photographs, zoning maps) are available
(essentially the region outside of Maricopa County). Twenty (or
more) classes are determined for each TM scene using the ISODATA
algorithm, and these classes are assigned a land cover category on
the basis of spectral signature, vegetation density, and geologic
setting. These land cover classifications are considered to be
preliminary pending detailed verification by ground
truthing.
Supervised classification algorithms rely on user-defined
training regions that represent pure samples of a particular class
(such as asphalt). Several different types of supervised
classification algorithms exist, but the major types are minimum
distance, parallelepiped, and maximum likelihood. Minimum distance
algorithms calculate a mean value in spectral space for each
training region, and then compare each image pixel value to these
means. Each image pixel is assigned to the class whose mean is
closest to it in value. The parallelepiped algorithm constructs a class
volume in data space to further constrain the identification of data
points as a given class. Maximum likelihood algorithms assume a
Gaussian distribution of pixel values within each training class and
tend to be somewhat more accurate in regions of high surficial
variability. Image pixels that fall within a specified number of
standard deviations of the training class mean are assigned to that
class. This method
has the added advantage of weighting such that image pixels are less
likely to be classified as covers with low probability of occurrence
in the scene. Maximum likelihood supervised classification is used
for regions that have good ground truth data available, such as the
Phoenix metro region.
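The difference between the minimum distance and maximum likelihood rules can be illustrated with a short Python sketch. It makes several simplifying assumptions that are not part of the description above: bands are treated as statistically independent (a diagonal covariance matrix), no standard deviation cutoff is applied, and the function names, the dictionary layout, and the optional priors argument are hypothetical.

```python
import math
from statistics import mean, pvariance

def train(training_regions):
    """Summarize each training region by per-band mean and variance.

    training_regions: dict mapping class name -> list of pixel tuples.
    """
    stats = {}
    for name, samples in training_regions.items():
        bands = list(zip(*samples))
        stats[name] = {
            "mean": [mean(b) for b in bands],
            # A small floor keeps the variance positive for uniform samples.
            "var": [max(pvariance(b), 1e-6) for b in bands],
        }
    return stats

def minimum_distance(pixel, stats):
    """Assign the pixel to the class with the nearest mean."""
    return min(stats, key=lambda c: math.dist(pixel, stats[c]["mean"]))

def maximum_likelihood(pixel, stats, priors=None):
    """Assign the pixel to the class with the highest Gaussian
    log-likelihood, optionally weighted by prior class probabilities."""
    def log_score(c):
        s = stats[c]
        log_like = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(pixel, s["mean"], s["var"])
        )
        prior = priors[c] if priors else 1.0
        return log_like + math.log(prior)
    return max(stats, key=log_score)

# Illustrative one-band training regions (made-up values).
regions = {"tight": [(-1,), (1,)], "broad": [(0,), (20,)]}
stats = train(regions)
```

With these made-up regions, a pixel value of 4 lies closer to the mean of the "tight" class, so `minimum_distance` selects it. `maximum_likelihood` selects the "broad" class because the pixel is four standard deviations from the "tight" mean but well within the spread of the "broad" one. Passing a very small prior for "broad" reverses that choice, which is the weighting effect described above.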
Verification of Classification Accuracy
The accuracy of any classification must be assessed prior
to use in scientific analysis. Accuracy assessment involves the
collection of ground truth data for the classified scene. This is
done by establishing a number of test pixels within the image for
which the actual ground cover is determined by field inspection, use
of aerial photographs, or use of some other dataset. An overall
classification accuracy is then determined by dividing the number of
test pixels correctly classified by the number of total test
pixels.
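The overall accuracy calculation is a simple ratio, sketched below in Python. The function name and the sample class labels are illustrative assumptions, not data from the study.

```python
def overall_accuracy(classified, reference):
    """Fraction of test pixels whose classified cover matches ground truth.

    classified: class labels assigned to the test pixels by the classifier.
    reference:  ground truth labels for the same test pixels, same order.
    """
    if not reference or len(classified) != len(reference):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(c == r for c, r in zip(classified, reference))
    return correct / len(reference)

# Four hypothetical test pixels, three of them classified correctly.
example = overall_accuracy(
    ["asphalt", "grass", "soil", "grass"],
    ["asphalt", "grass", "grass", "grass"],
)
```

Here `example` evaluates to 0.75, that is, 75% overall accuracy.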
Assessment of accuracy for the CAP LTER study area is not
straightforward as the major dataset available for comparison is the
Maricopa Association of Governments (MAG) Land Use Map. Land use can
be thought of as the purpose to which a particular area is devoted,
such as industrial or recreational use. Use of the MAG dataset for
comparison with classified TM data requires interpretation of the
land cover types in terms of possible land uses, which is
complicated as several land covers may be associated with the same
land use category. For example, the institutional MAG land use
category corresponds to the asphalt+concrete+soil+/-metal roofs,
asphalt+concrete+soil+/-metal roofs+/-grass, and
concrete+grass+/-woody veg.+/-asphalt land cover classes. Land use
is the primary data format used by the various LTER researchers, so
a fairly simple categorization scheme was defined for use in LTER
studies. Overall accuracy of the TM classification is 71%, a value
which is typical for TM data.
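The many-to-one relationship between land cover classes and a land use category can be represented as a lookup table, sketched here in Python. Only the three institutional cover classes named above are included. The table name, the function name, and the rule for handling unlisted classes are assumptions made for illustration and do not reproduce the full categorization scheme.

```python
# Land cover classes that the text associates with the institutional
# MAG land use category. Other categories are omitted from this sketch.
COVER_TO_USE = {
    "asphalt+concrete+soil+/-metal roofs": "institutional",
    "asphalt+concrete+soil+/-metal roofs+/-grass": "institutional",
    "concrete+grass+/-woody veg.+/-asphalt": "institutional",
}

def agrees_with_land_use(cover_class, mapped_use, cover_to_use=COVER_TO_USE):
    """True if the classified land cover is consistent with the land use
    recorded for the same location. Unlisted cover classes never agree."""
    return cover_to_use.get(cover_class) == mapped_use
```

A test pixel classified as any of the three cover classes would then count as correct when the land use map records institutional use at that location.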